Researchers Develop Vision-Language Foundation Model for Enhancing Medical Diagnostics


Sep 30, 2024

A research team led by Prof. WANG Shanshan at the Shenzhen Institute of Advanced Technology (SIAT) of the Chinese Academy of Sciences, together with collaborators, has developed a chest X-ray vision-language foundation model, MaCo, reducing the dependency on annotations while improving both clinical efficiency and diagnostic accuracy. The study was published in Nature Communications.

The rapid evolution of machine learning has driven notable advancements in automated diagnostic systems (ADS), boosting their performance in critical tasks like disease detection and lesion quantification. However, current methods, often relying on task-specific models, require significant computational resources and large amounts of labeled data. This heavy reliance on extensive annotations has impeded the widespread adoption of ADS in medical applications.

To address this problem, the researchers integrated expert medical knowledge into the proposed model, MaCo, while harnessing the advantages of pretext tasks and contrastive learning. They also introduced a novel correlation weighting mechanism that makes masked contrastive learning more effective by weighting masked regions according to their importance. This strategy enabled MaCo to significantly improve diagnostic accuracy while reducing its dependence on large annotated datasets. Notably, it retained a degree of anomaly recognition and localization capability even without annotations.
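For readers unfamiliar with contrastive learning, the general idea of a weighted image-report contrastive objective can be illustrated with a minimal sketch. This is not MaCo's implementation: the function name, the per-pair `weights` argument, and the temperature value are assumptions made for illustration, and the masking step and the paper's actual correlation weighting formula are not reproduced here.

```python
import numpy as np

def weighted_contrastive_loss(img_emb, txt_emb, weights, temperature=0.07):
    """Symmetric InfoNCE loss over N paired image/report embeddings.

    `weights` is a hypothetical per-pair importance weight (for example,
    reflecting how informative the visible image regions are). It stands in
    for a weighting mechanism and is NOT the formula used by MaCo.
    """
    # L2-normalise so that dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # shape (N, N)

    def matched_pair_nll(l):
        # Negative log-softmax of the diagonal (true pair) in each row.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(log_prob)

    # Average the image-to-text and text-to-image directions.
    per_pair = 0.5 * (matched_pair_nll(logits) + matched_pair_nll(logits.T))

    # Normalise the weights and take the weighted mean over pairs.
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float((w * per_pair).sum())

# Usage: four toy image/report pairs with uniform weights.
rng = np.random.default_rng(0)
images = rng.normal(size=(4, 8))
reports = images + 0.1 * rng.normal(size=(4, 8))  # roughly aligned pairs
loss = weighted_contrastive_loss(images, reports, np.ones(4))
```

The loss is small when each image embedding is closest to its own report embedding and grows when pairs are mismatched; the weights only change how much each pair contributes to the average.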

To comprehensively evaluate the effectiveness of MaCo, the researchers used six well-known open-source X-ray datasets to perform a range of label-efficient fine-tuning tasks, including classification, segmentation, and detection.

Experimental results showed that MaCo outperformed more than 10 state-of-the-art methods on tasks with varying levels of annotation. Its strong performance in zero-shot learning tasks underscores its potential to deliver enhanced diagnostic performance while significantly reducing the cost of manual annotation in medical applications.

"Our model addresses the challenge of limited annotations by reducing the burden of manual labeling while maintaining high diagnostic accuracy. We believe that MaCo sets a new benchmark for foundational models in the field of medical AI," said Prof. WANG.

Contact

LU Qun

Shenzhen Institute of Advanced Technology

E-mail: qun.lu@siat.ac.cn

Enhancing representation in radiography-reports foundation model: a granular alignment algorithm using masked contrastive learning
